Abstract

Modern communication receiver architectures center around digital signal processing (DSP), with
the bulk of the receiver processing being performed on digital signals obtained after analog-to-digital
conversion (ADC). In this paper, we explore Shannon-theoretic performance limits when ADC precision
is drastically reduced, from typical values of 8-12 bits used in current communication transceivers, to
1-3 bits. The goal is to obtain insight on whether DSP-centric transceiver architectures are feasible as
communication bandwidths scale up, recognizing that high-precision ADC at high sampling rates is
either unavailable, or too costly or power-hungry. Specifically, we evaluate the communication limits
imposed by low-precision ADC for the ideal real discrete-time Additive White Gaussian Noise (AWGN)
channel, under an average power constraint on the input. For an ADC with $K$ quantization bins (i.e., a
precision of $\log_2 K$ bits), we show that the Shannon capacity is achievable by a discrete input distribution
with at most $K + 1$ mass points. For 2-bin (1-bit) symmetric ADC, this result is tightened to show
that binary antipodal signaling is optimum for any signal-to-noise ratio (SNR). For multi-bit ADC, the
capacity is computed numerically, and the results obtained are used to make the following encouraging
observations regarding system design with low-precision ADC: (a) even at moderately high SNR of
up to 20 dB, 2-3 bit quantization results in only 10-20% reduction of spectral efficiency, which is
acceptable for large communication bandwidths, (b) standard equiprobable pulse amplitude modulation
with ADC thresholds set to implement maximum likelihood hard decisions is asymptotically optimum
at high SNR, and works well at low to moderate SNRs as well.
Index Terms

Channel Capacity, Optimum Input Distribution, AWGN Channel, Analog-to-Digital Converter. 

I. Introduction 

Digital signal processing (DSP) forms the core of modern digital communication receiver
implementations, with the analog baseband signal being converted to digital form using
Analog-to-Digital Converters (ADCs) which typically have 8-12 bits of precision. Operations such
as synchronization, equalization and demodulation are then performed in the digital domain,
greatly enhancing the flexibility available to the designer. The continuing exponential advances
in digital electronics, often summarized by Moore's "law" [1], imply that integrated circuit
implementations of such DSP-centric architectures can be expected to continue scaling up in
speed and down in cost. However, as the bandwidth of a communication system increases,
accurate conversion of the analog received signal into digital form requires high-precision,
high-speed ADC, which is costly and power-hungry [2]. One possible approach for designing
such high-speed systems is to drastically reduce the number of bits of ADC precision (e.g.,
to 1-3 bits) as sampling rates scale up. Such a design choice has significant implications
for all aspects of receiver design, including carrier and timing synchronization, equalization,
demodulation and decoding. However, before embarking on a comprehensive rethinking of
the communication system design, it is important to understand the fundamental limits on
communication performance imposed by low-precision ADC. In this paper, we take a first step in
this direction, investigating the Shannon-theoretic performance limits for the following idealized
model: linear modulation over a real baseband Additive White Gaussian Noise (AWGN) channel
with symbol rate Nyquist samples quantized by a low-precision ADC. This induces a
discrete-time memoryless AWGN-Quantized Output channel, which is depicted in Figure 1. Under an
average power constraint on the input, we obtain the following results:

1) For a $K$-level (i.e., $\log_2 K$ bits) output quantizer, we prove that the input distribution need
not have any more than $K + 1$ mass points to achieve the channel capacity. (Numerical
computation of optimal input distributions reveals that $K$ mass points are sufficient.) An
intermediate result of interest is that, when the AWGN channel output is quantized with
finite precision, an average power constraint leads to an implicit peak power constraint, in
the sense that an optimal input distribution must now have bounded support.

2) For 1-bit symmetric quantization, the preceding result can be tightened to show that binary
antipodal signaling is optimal for any signal-to-noise ratio (SNR).

3) For multi-bit quantizers, tight upper bounds on capacity are obtained using a dual
formulation of the capacity problem. Near-optimal input distributions that approach these bounds
are computed using the cutting-plane algorithm [31].

4) While the preceding results optimize the input distribution for a fixed quantizer, comparison
with an unquantized system requires an optimization over the choice of the quantizer as
well. We numerically obtain optimal 2-bit and 3-bit symmetric quantizers.

5) From our numerical results, we infer that low-precision ADC incurs a relatively small
loss in spectral efficiency compared to unquantized observations. For example, 2-bit ADC
achieves 95% of the spectral efficiency attained with unquantized observations at 0 dB
SNR. Even at a moderately high SNR of 20 dB, 3-bit ADC achieves 85% of the spectral
efficiency attained with unquantized observations. This indicates that DSP-centric design
based on low-precision ADC is indeed attractive as communication bandwidths scale up,
since the small loss in spectral efficiency should be acceptable in this regime. Furthermore,
we also observe that a "sensible" choice of standard equiprobable pulse amplitude
modulated (PAM) input with ADC thresholds set to implement maximum likelihood (ML)
hard decisions achieves performance which is quite close to that obtained by numerical
optimization of the quantizer and input distribution.

Fig. 1. $Y = \mathsf{Q}(X + N)$: The AWGN-Quantized Output channel induced by the output quantizer $\mathsf{Q}$.



Related Work 

For a Discrete Memoryless Channel (DMC), Gallager first showed that the number of input
points with nonzero probability mass need not exceed the cardinality of the output [3, p. 96,
Corollary 3]. In our setting, the input alphabet is not a priori discrete, and there is a power
constraint, so that the result in [3] does not apply. Our key result on the achievability of the
capacity by a discrete input is actually an extension of a result of Witsenhausen in [4], where
Dubins' theorem [5] was used to show that the capacity of a (discrete-time, memoryless and
stationary) channel with $K$ output levels, under a peak power constraint, is achievable by a discrete
input with at most $K$ points. The key to our proof is to show that under output quantization,
an average power constraint implies an implicit peak power constraint, after which we can use
Dubins' theorem in a manner similar to the development in [4].

Prior work on the effect of reduced ADC precision on channel capacity with fixed input
distribution includes [6], [7], [8]. However, other than our own preliminary results reported in
[9], [10], we are not aware of a Shannon-theoretic investigation with low-precision ADC that
includes optimization of the input distribution.

While we are interested in fundamental limits here, a strong motivation for this work comes
from emergent applications in high-bandwidth, multi-Gigabit, unlicensed wireless communication
systems using Ultrawideband (UWB) communication in the 3-10 GHz band [11], and millimeter
wave communication in the 60 GHz band [12]. Indeed, there has been prior exploration of the
impact of low-precision ADC in the specific context of UWB systems. Low power transceiver
architectures for UWB systems have been proposed in [13], [14]. The performance of UWB
receivers using 1-bit ADC has been analyzed in [15], including the use of dither and
oversampling. The effect of ADC precision on UWB performance is considered in [16]. Decomposition
of the UWB signal into parallel frequency channels in order to relax ADC speed requirements
is considered in [17], [18]. Demodulation and interference suppression techniques for UWB
communication using 1-bit ADC have been proposed in [19].

Given the encouraging results here, it becomes important to explore the impact of low-precision
ADC on receiver tasks such as synchronization and equalization, which we have ignored in our
idealized model (essentially assuming that these tasks have somehow already been accomplished).
Related work on estimation using low-precision samples which may be relevant for this purpose
includes the use of dither for signal reconstruction [20], [21], [22], frequency estimation using
1-bit ADC [23], [24], choice of quantization threshold for signal amplitude estimation [25], and
signal parameter estimation using 1-bit dithered quantization [26], [27].
Organization of the Paper

The rest of the paper is organized as follows. The AWGN-Quantized Output channel model is
described in the next section. In Section III, we show the existence of an implicit peak power
constraint, and use it to prove that the capacity is achievable by a discrete input distribution.
Section IV presents capacity computations, including duality-based upper bounds on capacity.
Quantizer optimization is considered in Section V, followed by the conclusions in Section VI.

II. Channel Model

We consider linear modulation over a real AWGN channel, with symbol rate Nyquist samples
quantized by a $K$-bin (or $K$-level) quantizer $\mathsf{Q}$. This induces the following discrete-time
memoryless AWGN-Quantized Output (AWGN-QO) channel

$$ Y = \mathsf{Q}(X + N). \tag{1} $$

Here $X \in \mathbb{R}$ is the channel input with cumulative distribution function $F(x)$, $Y \in \{y_1, \ldots, y_K\}$
is the (discrete) channel output, and $N$ is $\mathcal{N}(0, \sigma^2)$ (the Gaussian random variable with mean 0
and variance $\sigma^2$). $\mathsf{Q}$ maps the real valued input $X + N$ to one of the $K$ bins, producing a discrete
output $Y$. In this work, we only consider quantizers for which each bin is an interval of the real
line. The quantizer $\mathsf{Q}$ with $K$ bins is therefore characterized by the set of its $(K - 1)$ thresholds
$\mathbf{q} := [q_1, q_2, \ldots, q_{K-1}] \in \mathbb{R}^{K-1}$, such that $-\infty := q_0 < q_1 < q_2 < \cdots < q_{K-1} < q_K := \infty$. The
output $Y$ is assigned the value $y_i$ when the quantizer input $(X + N)$ falls in the $i$th bin, which
is given by the interval $(q_{i-1}, q_i]$. The resulting transition probability functions are

$$ W_i(x) = P(Y = y_i \mid X = x) = Q\left(\frac{q_{i-1} - x}{\sigma}\right) - Q\left(\frac{q_i - x}{\sigma}\right), \quad 1 \le i \le K, \tag{2} $$

where $Q(x)$ denotes the complementary Gaussian distribution function

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp(-t^2/2)\, dt. $$

The Probability Mass Function (PMF) of the output $Y$, corresponding to the input distribution
$F$, is

$$ R(y_i; F) = \int_{-\infty}^{\infty} W_i(x)\, dF(x), \quad 1 \le i \le K, \tag{3} $$

and the input-output mutual information $I(X; Y)$, expressed explicitly as a function of $F$, is¹

$$ I(F) = \int_{-\infty}^{\infty} \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)}\, dF(x). \tag{4} $$

Under an average power constraint $P$ on the channel input (i.e., $E[X^2] \le P$), we wish to
compute the capacity of the channel (1), which is given by

$$ C = \sup_{F \in \mathcal{F}} I(F), \tag{5} $$

where $\mathcal{F}$ is the set of all distributions on $\mathbb{R}$ that satisfy the average power constraint, i.e.,

$$ \mathcal{F} = \left\{ F : \int_{-\infty}^{\infty} x^2\, dF(x) \le P \right\}. \tag{6} $$
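For a discrete input distribution, the integrals in (3) and (4) reduce to finite sums, so the quantities defined in this section can be evaluated directly. The following is a minimal Python sketch under that assumption. The function names (`q_func`, `transition_probs`, `output_pmf`, `mutual_information`) and the example input are illustrative choices and are not part of the paper.

```python
import math


def q_func(x):
    """Complementary Gaussian distribution function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def transition_probs(x, thresholds, sigma=1.0):
    """Return [W_1(x), ..., W_K(x)] from Eq. (2) for thresholds q_1 < ... < q_{K-1}."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    return [
        q_func((edges[i] - x) / sigma) - q_func((edges[i + 1] - x) / sigma)
        for i in range(len(edges) - 1)
    ]


def output_pmf(points, probs, thresholds, sigma=1.0):
    """Output PMF R(y_i; F) from Eq. (3) for a discrete input distribution."""
    rows = [transition_probs(x, thresholds, sigma) for x in points]
    return [sum(p * row[i] for p, row in zip(probs, rows)) for i in range(len(rows[0]))]


def mutual_information(points, probs, thresholds, sigma=1.0):
    """Mutual information I(F) in bits, Eq. (4), with the integral reduced to a sum."""
    R = output_pmf(points, probs, thresholds, sigma)
    total = 0.0
    for x, p in zip(points, probs):
        for w, r in zip(transition_probs(x, thresholds, sigma), R):
            if p > 0.0 and w > 0.0:
                total += p * w * math.log2(w / r)
    return total


# Hypothetical example: 4 equiprobable points, 2-bit quantizer with thresholds {-2, 0, 2}.
points = [-3.0, -1.0, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]
print(mutual_information(points, probs, [-2.0, 0.0, 2.0]))
```

The infinite outer thresholds $q_0$ and $q_K$ are handled by `math.erfc`, which returns 2 and 0 at $-\infty$ and $+\infty$ respectively, so the first and last bins need no special casing.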



III. Discrete Input Distribution Achieves Capacity 

We first employ the Karush-Kuhn-Tucker (KKT) optimality condition to show that, even
though we have not imposed an explicit peak power constraint on the input, it is automatically
induced by the average power constraint. Specifically, an optimal input distribution must have a
bounded support set. This is then used to show that the capacity is achievable by a discrete input
distribution with at most $K + 1$ mass points. Note that our result does not, however, guarantee
that the capacity is achieved by a unique input distribution.

A. An Implicit Peak Power Constraint 

Using convex optimization principles, the following necessary and sufficient KKT optimality
condition can be derived for our problem, in a manner similar to the development in [29], [30].
An input distribution $F$ is optimal for (5) if and only if there exists a $\gamma \ge 0$ such that

$$ \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} + \gamma (P - x^2) \le I(F) \tag{7} $$

for all $x$, with equality if $x$ is in the support of $F$², where the transition probability functions
$W_i(x)$ and the output PMF $R(y_i; F)$ are as specified in (2) and (3), respectively.

The first term on the left hand side of the KKT condition is the Kullback-Leibler divergence
(or the relative entropy), $D(W(\cdot|x) \| R(\cdot; F))$, between the transition and the output distributions.
For convenience, let us denote it by $d(x; F)$. We first study the behavior of this function in the
limit as $x \to \infty$.

¹ The logarithm is base 2 throughout the paper, so the mutual information is measured in bits.
² The support of $F$ (or the set of increase points of $F$) is the set $S_X(F) = \{x : F(x + \epsilon) - F(x - \epsilon) > 0, \ \forall \epsilon > 0\}$.

Lemma 1: For the AWGN-QO channel (1) with input distribution $F$, the divergence function
$d(x; F)$ satisfies the following properties:

(a) $\lim_{x \to \infty} d(x; F) = -\log R(y_K; F)$.

(b) There exists a finite constant $A_0$ such that $\forall\, x > A_0$, $d(x; F) < -\log R(y_K; F)$.

Proof: We have

$$ d(x; F) = \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} = \sum_{i=1}^{K} W_i(x) \log(W_i(x)) - \sum_{i=1}^{K} W_i(x) \log(R(y_i; F)). $$

For any finite noise variance $\sigma^2$, as $x \to \infty$, the conditional PMF $W_i(x)$ tends to the unit
mass at $i = K$. This observation, combined with the fact that the entropy of a finite alphabet
random variable is a continuous function of its probability law, gives

$$ \lim_{x \to \infty} d(x; F) = 0 - \log(R(y_K; F)) = -\log(R(y_K; F)). $$

To prove part (b), we pick $A_0$ to be such that $W_i(A_0) < R(y_i; F)$ for $i = 1, 2, \ldots, K - 1$,
and also that $W_K(A_0) > R(y_K; F)$. Such an $A_0$ always exists because for $x > q_{K-1}$, the
transition probabilities $W_i(x) \to 0$ and are strictly monotone decreasing functions of $x$ for
$i = 1, \ldots, K - 1$, while $W_K(x) \to 1$ and is a strictly monotone increasing function of $x$ (the
strict monotonicity is easy to see by evaluating the derivatives of the transition probabilities).
With such a choice of $A_0$, we get that for $x > A_0$,

$$ d(x; F) = \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} < W_K(x) \log \frac{W_K(x)}{R(y_K; F)} < -\log(R(y_K; F)). $$ ■
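Lemma 1 can be illustrated numerically. The sketch below evaluates $d(x; F)$ for a hypothetical four-point input and a 2-bit quantizer with thresholds $\{-2, 0, 2\}$ (both chosen here only for illustration), and compares it with the limit $-\log R(y_K; F)$. At very large $x$ the floating-point values of $W_i(x)$ for $i < K$ round to zero, so the strict inequality of part (b) is only visible at moderate $x$.

```python
import math


def q_func(x):
    """Complementary Gaussian distribution function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def transition_probs(x, thresholds, sigma=1.0):
    """Return [W_1(x), ..., W_K(x)] as in Eq. (2)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    return [
        q_func((edges[i] - x) / sigma) - q_func((edges[i + 1] - x) / sigma)
        for i in range(len(edges) - 1)
    ]


def divergence(x, R, thresholds, sigma=1.0):
    """d(x; F) = D(W(.|x) || R(.; F)) in bits."""
    return sum(
        w * math.log2(w / r)
        for w, r in zip(transition_probs(x, thresholds, sigma), R)
        if w > 0.0
    )


thresholds = [-2.0, 0.0, 2.0]
points, probs = [-3.0, -1.0, 1.0, 3.0], [0.25] * 4  # hypothetical input distribution
R = [
    sum(p * transition_probs(x, thresholds)[i] for x, p in zip(points, probs))
    for i in range(4)
]
limit = -math.log2(R[-1])  # -log R(y_K; F), the limit in part (a)

for x in (3.0, 6.0, 9.0, 12.0):
    print(x, divergence(x, R, thresholds), limit)
```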



Using Lemma 1, we now prove the main result of this subsection. 

Proposition 1: For the average power constrained AWGN-QO channel (1), an optimal input
distribution must have bounded support.
Proof: Let us assume that the input distribution $F^*$ achieves³ the capacity in (5), i.e.,
$I(F^*) = C$. Let $\gamma^* \ge 0$ denote a corresponding optimal Lagrange parameter, so that the KKT
condition is satisfied. In other words, with $\gamma = \gamma^*$ and $F = F^*$, (7) must be satisfied with
equality at every point in the support of $F^*$. We exploit this necessary condition next to show
that the support of $F^*$ is upper bounded. Specifically, we prove that there exists a finite constant
$A_2^*$ such that it is not possible to attain equality in (7) for any $x > A_2^*$.

Using Lemma 1, we let $\lim_{x \to \infty} d(x; F^*) = -\log(R(y_K; F^*)) = L$, and let $A_0$ be a finite
constant such that $\forall\, x > A_0$, $d(x; F^*) < L$.
We consider two possible cases.

• Case 1: $\gamma^* > 0$.
If $C > L + \gamma^* P$, then pick $A_2^* = A_0$.
Else pick $A_2^* \ge \max\{A_0, \sqrt{(L + \gamma^* P - C)/\gamma^*}\}$.
In either situation, for $x > A_2^*$, we get $d(x; F^*) < L$, and $\gamma^* x^2 > L + \gamma^* P - C$.
This gives

$$ d(x; F^*) + \gamma^* (P - x^2) < L + \gamma^* P - (L + \gamma^* P - C) = C. $$

• Case 2: $\gamma^* = 0$.
Putting $\gamma^* = 0$ in the KKT condition (7), we get

$$ d(x; F^*) \le C, \quad \forall\, x. $$

Thus,

$$ L = \lim_{x \to \infty} d(x; F^*) \le C. $$

Picking $A_2^* = A_0$, we therefore have that for $x > A_2^*$

$$ d(x; F^*) + \gamma^* (P - x^2) = d(x; F^*) < L $$
$$ \implies d(x; F^*) + \gamma^* (P - x^2) < C. $$

Combining the two cases, we have shown that the support of the distribution $F^*$ has a finite
upper bound $A_2^*$. Using similar arguments, it can easily be shown that the support of $F^*$ has a
finite lower bound $A_1^*$ as well, which implies that $F^*$ has a bounded support. ■

³ That there exists an input which achieves the supremum in (5) is shown in Appendix I.



B. Achievability of Capacity by a Discrete Input 

In [4], Witsenhausen considered a stationary discrete-time memoryless channel, with a
continuous input $X$ taking values on the compact interval $[A_1, A_2] \subset \mathbb{R}$, and a discrete output $Y$ of finite
cardinality $K$. It was shown that if the channel transition probability functions are continuous
(i.e., $W_i(x)$ is continuous in $x$, for each $i = 1, \ldots, K$), then the capacity is achievable by a
discrete input distribution with at most $K$ mass points. As stated in Theorem 1 below (proved in
Appendix II), this result can be extended to show that, if an additional average power constraint
is imposed on the input, the capacity is then achievable by a discrete input with at most $K + 1$
mass points.

Theorem 1: Consider a stationary discrete-time memoryless channel with a continuous input
$X$ that takes values in the bounded interval $[A_1, A_2]$, and a discrete output $Y \in \{y_1, y_2, \ldots, y_K\}$.
Let the channel transition probability function $W_i(x) = P(Y = y_i \mid X = x)$ be continuous in $x$
for each $i$, where $1 \le i \le K$. The capacity of this channel, under an average power constraint
on the input, is achievable by a discrete input distribution with at most $K + 1$ mass points.
Proof: See Appendix II. ■

Theorem 1, coupled with the implicit peak power constraint derived in the previous subsection 
(Proposition 1), gives us the following result. 

Proposition 2: The capacity of the average power constrained AWGN-QO channel (1) is
achievable by a discrete input distribution with at most $K + 1$ points of support.

Proof: Using notation from the last subsection, let $F^*$ be an optimal distribution for (5),
with the support of $F^*$ being contained in the bounded interval $[A_1^*, A_2^*]$. Define $\mathcal{F}_1$ to be
the set of all average power constrained distributions $F$ whose support $S_X(F)$ is contained in
$[A_1^*, A_2^*]$, i.e.,

$$ \mathcal{F}_1 = \{ F \in \mathcal{F} : S_X(F) \subseteq [A_1^*, A_2^*] \}, \tag{8} $$

where $\mathcal{F}$ is the set of all average power constrained distributions on $\mathbb{R}$, as defined in (6). Note
that $F^* \in \mathcal{F}_1 \subset \mathcal{F}$. Consider the maximization of the mutual information $I(X; Y)$ over the set
$\mathcal{F}_1$:

$$ C_1 = \max_{F \in \mathcal{F}_1} I(F). \tag{9} $$

Since the transition probability functions in (2) are continuous in $x$, Theorem 1 implies that a
discrete distribution with at most $K + 1$ mass points achieves the maximum $C_1$ in (9). Denote
such a distribution by $F_1$. However, since $F^*$ achieves the maximum $C$ in (5) and $F^* \in \mathcal{F}_1$, it
must also achieve the maximum in (9). This implies that $C_1 = C$, and that $F_1$ is optimal for
(5), thus completing the proof. ■

C. Symmetric Inputs for Symmetric Quantization 

For our numerical capacity computations ahead, we assume that the quantizer $\mathsf{Q}$ employed in
(1) is symmetric, i.e., its threshold vector $\mathbf{q}$ is symmetric about the origin. Given the symmetric
nature of the AWGN noise and the power constraint, it seems intuitively plausible that restriction
to symmetric quantizers should not be suboptimal from the point of view of optimizing over the
quantizer choice in (1), although a proof of this conjecture has eluded us. However, once we
assume that the quantizer in (1) is symmetric, we can restrict attention to only symmetric input
distributions without loss of optimality, as stated in the following lemma.

Lemma 2: If the quantizer in (1) is symmetric, then, without loss of optimality, we can
consider only symmetric input distributions (i.e., $F(x) = 1 - F(-x)$, $\forall\, x \in \mathbb{R}$) for the capacity
computation in (5).

Proof: Suppose we are given an input distribution $F(x)$ that is not necessarily symmetric.
Consider now the following symmetric mixture distribution

$$ \tilde{F}(x) = \frac{F(x) + 1 - F(-x)}{2}. $$

This mixture can be achieved by choosing distribution $F(x)$ or $1 - F(-x)$ with probability 1/2
each. If we use $\tilde{F}(x)$ in place of $F(x)$, the conditional entropy $H(Y|X)$ remains unchanged
due to the symmetric nature of the noise $N$ and the quantizer. However, the output entropy $H(Y)$
changes as follows. Suppose that, when $F(x)$ is used, the PMF of $Y$ is $\mathbf{a} = [a_1, \ldots, a_K]$. Then
under $1 - F(-x)$ it is $\bar{\mathbf{a}} = [a_K, \ldots, a_1]$. Hence under $\tilde{F}(x)$, the output $Y$ has the mixture PMF
$\tilde{\mathbf{a}} = \frac{1}{2}(\mathbf{a} + \bar{\mathbf{a}})$. Since entropy is a concave function of the PMF,

$$ H(Y)\big|_{\tilde{\mathbf{a}}} \ge \frac{1}{2} H(Y)\big|_{\mathbf{a}} + \frac{1}{2} H(Y)\big|_{\bar{\mathbf{a}}} = H(Y)\big|_{\mathbf{a}}. $$

It follows that under the symmetric distribution $\tilde{F}(x)$, $I(X; Y) = H(Y) - H(Y|X)$ is at least
as large as that under $F(x)$, which proves the desired result. ■
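The symmetrization step in the proof of Lemma 2 is easy to check numerically for a discrete input. The sketch below builds the mixture of a hypothetical asymmetric two-point input with its mirror image and compares the mutual information before and after. The helper names, the input, and the symmetric thresholds $\{-1, 0, 1\}$ are assumptions made for this illustration only.

```python
import math


def q_func(x):
    """Complementary Gaussian distribution function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def transition_probs(x, thresholds, sigma=1.0):
    """Return [W_1(x), ..., W_K(x)] as in Eq. (2)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    return [
        q_func((edges[i] - x) / sigma) - q_func((edges[i + 1] - x) / sigma)
        for i in range(len(edges) - 1)
    ]


def mutual_information(points, probs, thresholds, sigma=1.0):
    """I(F) in bits for a discrete input with mass probs[j] at points[j]."""
    rows = [transition_probs(x, thresholds, sigma) for x in points]
    R = [sum(p * row[i] for p, row in zip(probs, rows)) for i in range(len(rows[0]))]
    return sum(
        p * w * math.log2(w / r)
        for p, row in zip(probs, rows)
        for w, r in zip(row, R)
        if p > 0.0 and w > 0.0
    )


def symmetrize(points, probs):
    """Mix F(x) with its mirror image 1 - F(-x), each chosen with probability 1/2."""
    mass = {}
    for x, p in zip(points, probs):
        mass[x] = mass.get(x, 0.0) + 0.5 * p
        mass[-x] = mass.get(-x, 0.0) + 0.5 * p
    xs = sorted(mass)
    return xs, [mass[x] for x in xs]


thresholds = [-1.0, 0.0, 1.0]               # symmetric 2-bit quantizer (hypothetical)
points, probs = [-1.0, 2.0], [0.7, 0.3]     # asymmetric input (hypothetical)
sym_points, sym_probs = symmetrize(points, probs)

print(mutual_information(points, probs, thresholds))
print(mutual_information(sym_points, sym_probs, thresholds))
```

The mixture has the same second moment as the original input, so it satisfies the same average power constraint.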
IV. Capacity Computation

We now consider capacity computation for the AWGN-QO channel. We first provide an explicit 
capacity formula for the extreme scenario of 1-bit symmetric quantization, and then discuss 
numerical computations for multi-bit quantization. 



A. 1-bit Symmetric Quantization: Binary Antipodal Signaling is Optimal

With 1-bit symmetric quantization, the channel is

$$ Y = \mathrm{sign}(X + N). \tag{10} $$

Proposition 2 (Section III-B) guarantees that the capacity of this channel is achievable by a
discrete input distribution with at most 3 points. This result is further tightened by the following
theorem that shows the optimality of binary antipodal signaling for all SNRs.

Theorem 2: For the 1-bit symmetric quantized channel model (10), the capacity is achieved
by binary antipodal signaling and is given by

$$ C = 1 - h\left(Q\left(\sqrt{\mathrm{SNR}}\right)\right), \quad \mathrm{SNR} = \frac{P}{\sigma^2}, $$

where $h(p)$ is the binary entropy function

$$ h(p) = -p \log(p) - (1 - p) \log(1 - p), \quad 0 \le p \le 1. $$

Proof: Since $Y$ is binary it is easy to see that

$$ H(Y|X) = E\left[ h\left( Q\left( \frac{X}{\sigma} \right) \right) \right], $$

where $E$ denotes the expectation operator. Therefore

$$ I(X; Y) = H(Y) - E\left[ h\left( Q\left( \frac{X}{\sigma} \right) \right) \right], $$

which we wish to maximize over all input distributions satisfying $E[X^2] \le P$. Since the quantizer
is symmetric, we can restrict attention to symmetric input distributions without loss of optimality
(cf. Lemma 2). On doing so, we obtain that the PMF of the output $Y$ is also symmetric (since
the quantizer and the noise distribution are already symmetric). Therefore, $H(Y) = 1$ bit, and
we obtain

$$ C = 1 - \min_{\substack{X \text{ symmetric} \\ E[X^2] \le P}} E\left[ h\left( Q\left( \frac{X}{\sigma} \right) \right) \right]. $$

Since $h(Q(z))$ is an even function, we get that

$$ H(Y|X) = E\left[ h\left( Q\left( \frac{X}{\sigma} \right) \right) \right] = E\left[ h\left( Q\left( \frac{|X|}{\sigma} \right) \right) \right]. $$

In Appendix III, we show that the function $h(Q(\sqrt{y}))$ is convex in $y$. Thus, Jensen's inequality
[32] implies that

$$ H(Y|X) \ge h\left( Q\left( \sqrt{\mathrm{SNR}} \right) \right) $$

with equality iff $X^2 = P$. Coupled with the symmetry condition on $X$, this implies that binary
antipodal signaling achieves capacity and the capacity is

$$ C = 1 - h\left( Q\left( \sqrt{\mathrm{SNR}} \right) \right). $$ ■
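The closed form in Theorem 2 is straightforward to evaluate. The sketch below computes it at a few SNRs and, as a sanity check on the Jensen step, compares it with the rate of a symmetric three-point input of the same power, which for a symmetric input equals $p\,(1 - h(Q(\sqrt{\mathrm{SNR}/p})))$ when mass $1 - p$ sits at the origin. The unquantized AWGN capacity $\frac{1}{2}\log_2(1 + \mathrm{SNR})$ is the standard formula, included only for comparison. Function names and the chosen SNR values are illustrative assumptions.

```python
import math


def q_func(x):
    """Complementary Gaussian distribution function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def one_bit_capacity(snr_db):
    """Capacity of Y = sign(X + N) from Theorem 2, in bits per channel use."""
    snr = 10.0 ** (snr_db / 10.0)
    return 1.0 - binary_entropy(q_func(math.sqrt(snr)))


def three_point_rate(snr_db, p):
    """I(X;Y) for mass 1 - p at 0 and p/2 at each of +/- sqrt(P / p), so E[X^2] = P."""
    snr = 10.0 ** (snr_db / 10.0)
    return p * (1.0 - binary_entropy(q_func(math.sqrt(snr / p))))


def unquantized_capacity(snr_db):
    """Standard real AWGN capacity 0.5 * log2(1 + SNR), for comparison only."""
    return 0.5 * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


for snr_db in (-5, 0, 5, 10, 20):
    print(snr_db, one_bit_capacity(snr_db), three_point_rate(snr_db, 0.5),
          unquantized_capacity(snr_db))
```

Setting `p = 1.0` in `three_point_rate` recovers binary antipodal signaling, so the two functions agree in that case.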



B. Multi-Bit Quantization 

We now consider $K$-level quantization, where $K > 2$. It appears unlikely that closed form
expressions for optimal input and capacity can be obtained, due to the complicated expression
for mutual information. We therefore resort to the cutting-plane algorithm [31, Sec. IV-A] to
generate optimal inputs numerically. For channels with continuous input alphabets, the
cutting-plane algorithm can, in general, be used to generate nearly optimal discrete input distributions.
It is therefore well matched to our problem, for which we already know that the capacity is
achievable by a discrete input distribution. It is worth mentioning that discretized Blahut-Arimoto
type algorithms to compute the capacity of infinite input finite (infinite)-output channels have
earlier been reported in [43], although they do not incorporate an average power constraint on
the input.

We fix the noise variance $\sigma^2 = 1$, and vary the power $P$ to obtain capacity at different SNRs.
To apply the cutting-plane algorithm, we take a fine quantized discrete grid on the interval
$[-10\sqrt{P}, 10\sqrt{P}]$, and optimize the input distribution over this grid. Note that Proposition 1
(Section III-A) tells us that an optimal input distribution for our problem must have a bounded
support, but it does not give explicit values that we can use directly in our simulations. However,
on employing the cutting-plane algorithm over the interval $[-10\sqrt{P}, 10\sqrt{P}]$, we find that the
resulting input distributions have support sets well within this interval. Moreover, increasing the
interval length further does not change these results.
The input distributions generated by the cutting-plane algorithm are shown in our numerical
results. We find that these distributions have support set cardinality at most $K + 1$, as predicted
by Proposition 2. The optimality of these distributions can further be verified by comparing the
mutual information they achieve with easily computable tight upper bounds on the capacity. The
computation of these upper bounds is discussed next.

1) Duality-Based Upper Bound on Channel Capacity: In the dual formulation of the channel
capacity problem, we focus on the distribution of the channel output, rather than that of the
input. Specifically, assume a channel with input alphabet $\mathcal{X}$, transition law $W(y|x)$, and an
average power constraint $P$. Then, for every choice of the output distribution $R(y)$, we have the
following upper bound on the channel capacity $C$

$$ C \le U(R) = \min_{\gamma \ge 0} \sup_{x \in \mathcal{X}} \left[ D(W(\cdot|x) \| R(\cdot)) + \gamma (P - x^2) \right], \tag{11} $$

where $\gamma$ is a Lagrange parameter, and $D(W(\cdot|x) \| R(\cdot))$ is the divergence between the transition
and output distributions. While [33] provides this bound for a Discrete Memoryless Channel
(DMC), its extension to continuous alphabet channels has been established in [34], [35]. A
detailed perspective on the use of duality-based upper bounds can be found in [36].

For an arbitrary choice of $R(y)$, the bound (11) might be quite loose. Therefore, to obtain a
tight upper bound, we may need to evaluate (11) for a large number of output distributions and
pick the minimum of the resulting upper bounds. This could be tedious in general, especially
if the output alphabet is continuous. However, for the channel model we consider, the output
alphabet is discrete with small cardinality. For example, for 2-bit quantization, the space of all
output distributions is characterized by a set of just 3 parameters in the interval (0, 1). This makes
the dual formulation attractive, since we can easily obtain a tight upper bound on capacity by
evaluating the upper bound in (11) for different choices of these parameters.

Next, we discuss computation of the upper bound (11) for our problem, for a fixed output
distribution $R(y)$.

Computation of the Upper Bound: For convenience, we denote $d(x) = D(W(\cdot|x) \| R(\cdot))$,
and $g(x, \gamma) = d(x) + \gamma (P - x^2)$, so that we need to compute $\min_{\gamma \ge 0} \sup_{x \in \mathcal{X}} g(x, \gamma)$. Consider first the
maximization over $x$, for a fixed $\gamma$. Although the input alphabet $\mathcal{X}$ is the real line $\mathbb{R}$, from a
practical standpoint, we can restrict attention to a bounded interval $[M_1, M_2]$ while performing this
maximization. This is justified as follows. From Lemma 1, we know that $\lim_{x \to \infty} d(x) = \log \frac{1}{R(y_K)}$.
The saturating nature of $d(x)$, coupled with the non-increasing nature of $\gamma (P - x^2)$ for $x > 0$, implies that
for all practical purposes, the search for the supremum of $d(x) + \gamma (P - x^2)$ over $x$ can be
restricted to $x \le M_2$, where $M_2$ is large enough to ensure that the difference $\left| d(x) - \log \frac{1}{R(y_K)} \right|$
is negligible for $x > M_2$. In our simulations, we take $M_2 = q_{K-1} + 5\sigma$, where $q_{K-1}$ is the largest
quantizer threshold, and $\sigma^2$ is the noise variance. This choice of $M_2$ ensures that for $x > M_2$,
the conditional PMF $W_i(x)$ is nearly the same as the unit mass at $i = K$, which consequently
makes the difference between $d(x)$ and $\log \frac{1}{R(y_K)}$ negligible for $x > M_2$, as desired. Similarly,
the search for the supremum over $x$ can also be restricted to $x \ge M_1 = q_1 - 5\sigma$, where $q_1$ is
the smallest quantizer threshold. Note that if the quantizer and the output distribution $R(y)$ are
picked to be symmetric, then the function $g(x, \gamma)$ is also symmetric in $x$, so that we can further
restrict attention to $[0, M_2]$.

We now need to compute $\min_{\gamma \ge 0} \max_{x \in [M_1, M_2]} g(x, \gamma)$. To do this, we quantize the interval $[M_1, M_2]$
to generate a fine grid $\{x_1, x_2, \ldots, x_I\}$, and approximate the maximization over $x \in [M_1, M_2]$ as
a maximization over this quantized grid. This reduces the computation of the upper bound to
computing the function $\min_{\gamma \ge 0} \max_{1 \le i \le I} g(x_i, \gamma)$. Denoting $r_i(\gamma) := g(x_i, \gamma)$, this becomes
$\min_{\gamma \ge 0} \max_{1 \le i \le I} r_i(\gamma)$.
Hence, we are left with the task of minimizing (over $\gamma$) the maximum value of a finite set of
functions of $\gamma$, which in turn can be done directly using the standard Matlab tool fminimax.
Moreover, we note that the function being minimized over $\gamma$, i.e., $m(\gamma) := \max_{1 \le i \le I} r_i(\gamma)$, is convex
in $\gamma$. This follows from the observation that each of the functions $r_i(\gamma) = d(x_i) + \gamma (P - x_i^2)$ is
convex in $\gamma$ (in fact, affine in $\gamma$), so that their pointwise maximum is also convex in $\gamma$ [37, pp.
81]. The convexity of $m(\gamma)$ guarantees that fminimax provides us the global minimum over $\gamma$.
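The grid-based computation just described can be sketched in a few lines of Python. This is not the Matlab fminimax routine used in the paper: because $m(\gamma)$ is convex, a plain ternary search over $\gamma$ is substituted here. The search range `gamma_max`, the grid sizes, the coarse scan over the symmetric output parameter, and all function names are assumptions made for this illustration.

```python
import math


def q_func(x):
    """Complementary Gaussian distribution function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def divergence(x, R, thresholds, sigma=1.0):
    """d(x) = D(W(.|x) || R(.)) in bits, with W_i(x) as in Eq. (2)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    total = 0.0
    for i, r in enumerate(R):
        w = q_func((edges[i] - x) / sigma) - q_func((edges[i + 1] - x) / sigma)
        if w > 0.0:
            total += w * math.log2(w / r)
    return total


def dual_upper_bound(R, thresholds, P, sigma=1.0, grid_size=2001, gamma_max=5.0):
    """Approximate U(R) in Eq. (11) on a grid over [M1, M2]; returns (bound, gamma)."""
    m1 = thresholds[0] - 5.0 * sigma      # M1 = q_1 - 5 sigma
    m2 = thresholds[-1] + 5.0 * sigma     # M2 = q_{K-1} + 5 sigma
    xs = [m1 + (m2 - m1) * k / (grid_size - 1) for k in range(grid_size)]
    d = [divergence(x, R, thresholds, sigma) for x in xs]

    def m(gamma):
        # Pointwise maximum of the affine functions r_i(gamma); convex in gamma.
        return max(dk + gamma * (P - x * x) for dk, x in zip(d, xs))

    lo, hi = 0.0, gamma_max               # assumed search range for gamma
    for _ in range(60):                   # ternary search, valid because m is convex
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if m(a) <= m(b):
            hi = b
        else:
            lo = a
    gamma = 0.5 * (lo + hi)
    return m(gamma), gamma


# Hypothetical run: 2-bit quantizer {-2, 0, 2}, sigma^2 = 1, SNR = 5 dB,
# coarse scan over symmetric output PMFs {0.5 - a, a, a, 0.5 - a}.
P = 10.0 ** 0.5
thresholds = [-2.0, 0.0, 2.0]
alphas = [0.05 * k for k in range(1, 10)]
bounds = [
    dual_upper_bound([0.5 - a, a, a, 0.5 - a], thresholds, P, grid_size=801)[0]
    for a in alphas
]
best = min(bounds)
print(best, alphas[bounds.index(best)])
```

For the 1-bit channel with the uniform output PMF, which is the optimal output distribution by Theorem 2, this bound should coincide with $1 - h(Q(\sqrt{\mathrm{SNR}}))$ up to grid error, which gives a convenient check of the routine.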

2) Numerical Results: We now compare numerical results obtained using the cutting-plane
algorithm with capacity upper bounds obtained using the preceding dual formulation. We fix the
choice of quantizer to 2-bit symmetric quantization, in which case the quantizer is characterized
by a single parameter $q$, with the quantizer thresholds being $\{-q, 0, q\}$. The results depicted in
this section are for the particular quantizer choice $q = 2$.

The input distributions generated by the cutting-plane algorithm at various SNRs (setting
$\sigma^2 = 1$) are shown in Figure 2, and the mutual information achieved by them is given in Table
I. As predicted by Proposition 2 (Section III-B), the support set of the input distribution (at each
SNR) has cardinality $\le 5$.

Fig. 2. Probability Mass Function of the optimal input generated by the cutting-plane algorithm at various SNRs, for the 2-bit
symmetric quantizer with thresholds $\{-2, 0, 2\}$.

TABLE I
Duality-based upper bounds on channel capacity compared with the mutual information (MI) achieved
by the distributions generated using the cutting-plane algorithm.



For upper bound computations, we evaluate (11) for different symmetric output distributions.
For 2-bit quantization, the set of symmetric outputs is characterized by just one parameter
$\alpha \in (0, 0.5)$, with the probability distribution on the output being $\{0.5 - \alpha, \alpha, \alpha, 0.5 - \alpha\}$. We
vary $\alpha$ over a fine discrete grid on (0, 0.5), and compute the upper bound for each value of $\alpha$.
The least upper bound achieved thus, at a number of different SNRs, is shown in Table I.

From the results, we see that the input distributions generated by the cutting-plane algorithm
are nearly optimal, since they nearly achieve the capacity upper bound at every SNR. It is also
insightful to look at the KKT condition for these input distributions. For instance, consider an
SNR of 5 dB, for which the input distribution generated by the cutting-plane algorithm has
support set $\{-2.86, -0.52, 0.52, 2.86\}$ and achieves a mutual information of 0.8668 bits. Figure
3 plots the function $g(x, \gamma)$ (i.e., the left hand side of the KKT condition (7)) for this input,
with $\gamma = 0.1530$. We see that $g(x, \gamma)$ equals the mutual information at points in the support set
of the input distribution, and is less than the mutual information everywhere else. The sufficient
nature of the KKT condition therefore confirms the optimality of this input distribution. Note
that we show the plot for $x \ge 0$ only because $g(x, \gamma)$ is symmetric in $x$.

Fig. 3. KKT condition confirms the optimality of the input distribution generated by the cutting-plane algorithm.

V. Optimization Over Quantizer 

Till now, we have addressed the problem of capacity computation with a fixed output quantizer. 
The cutting-plane algorithm can be used to do this computation. In this section, we consider 
quantizer optimization, and numerically obtain optimal 2-bit and 3-bit symmetric quantizers. 

A Simple Benchmark: While an optimal quantizer, along with a corresponding optimal input 
distribution, provides the absolute communication limits for our model, we do not have a simple 
analytical characterization of their dependence on SNR. From a system designer's perspective, 
therefore, it is of interest to also examine suboptimal choices that are easy to adapt as a 
function of SNR, as long as the penalty relative to the optimal solution is not excessive. 
Specifically, we take the following input and quantizer pair to be our benchmark strategy : for 
a $K$-level quantizer, consider equiprobable, equispaced $K$-PAM (Pulse Amplitude Modulation),
with quantizer thresholds chosen to be the mid-points of the input mass point locations. That is, 
the quantizer thresholds correspond to the ML hard decision boundaries. Both the input mass points
and the quantizer thresholds have a simple, well-defined dependence on SNR, and can therefore
be adapted easily at the receiver based on the measured SNR. An explicit expression for the
mutual information of our benchmark scheme is easy to compute. We can also obtain insight
from the following lower bound on the mutual information, which is a direct consequence of
Fano's inequality [32, pp. 37]:



$$H_B(X|Y) \le h(P_e) + P_e \log_2 (K - 1)$$

$$\implies I_B(X; Y) \ge \log_2 K - h(P_e) - P_e \log_2 (K - 1),$$

where $h(\cdot)$ is the binary entropy function, and the subscript $B$ denotes the benchmark choice.
The probability of error $P_e$ with the ML decisions is

$$P_e = \frac{2(K - 1)}{K}\, Q\left(\sqrt{\frac{3\,\mathrm{SNR}}{K^2 - 1}}\right),$$

where $Q(\cdot)$ is the complementary Gaussian distribution function.
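As a rough numerical illustration of the bound above, the following sketch computes both the exact mutual information of the benchmark scheme and the Fano lower bound. The function names are hypothetical, the noise variance is normalized to 1, and the error probability used is the standard symbol error rate of equiprobable $K$-PAM with mid-point decisions.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bin_prob(lo, hi, x, sigma=1.0):
    """P(lo <= x + N < hi) for N ~ N(0, sigma^2), using the smaller tail."""
    a, b = (lo - x) / sigma, (hi - x) / sigma
    if a > 0:
        return q_func(a) - q_func(b)
    return q_func(-b) - q_func(-a)

def benchmark_scheme(K, snr_db, sigma=1.0):
    """Equispaced K-PAM points with average power P, and mid-point thresholds."""
    power = sigma ** 2 * 10 ** (snr_db / 10)
    half_d = math.sqrt(3 * power / (K * K - 1))      # half the PAM spacing
    points = [(2 * k - (K - 1)) * half_d for k in range(K)]
    thresholds = [(points[k] + points[k + 1]) / 2 for k in range(K - 1)]
    return points, thresholds

def mutual_information(points, probs, thresholds, sigma=1.0):
    """I(X; Y) in bits for a discrete input and a quantized AWGN output."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    rows = [[bin_prob(lo, hi, x, sigma) for lo, hi in zip(edges, edges[1:])]
            for x in points]
    out = [sum(p * w[i] for p, w in zip(probs, rows))
           for i in range(len(edges) - 1)]
    return sum(p * wi * math.log2(wi / out[i])
               for p, w in zip(probs, rows)
               for i, wi in enumerate(w) if p > 0 and wi > 0)

def fano_lower_bound(K, snr_db):
    """log2(K) - h(Pe) - Pe * log2(K - 1), with Pe the K-PAM symbol error rate."""
    snr = 10 ** (snr_db / 10)
    pe = 2 * (K - 1) / K * q_func(math.sqrt(3 * snr / (K * K - 1)))
    h = 0.0 if pe <= 0.0 or pe >= 1.0 else (
        -pe * math.log2(pe) - (1 - pe) * math.log2(1 - pe))
    return math.log2(K) - h - pe * math.log2(K - 1)

if __name__ == "__main__":
    for snr_db in (0, 5, 10, 15, 20):
        pts, thr = benchmark_scheme(4, snr_db)
        exact = mutual_information(pts, [0.25] * 4, thr)
        print(snr_db, round(exact, 4), round(fano_lower_bound(4, snr_db), 4))
```

The gap between the two columns shrinks as the SNR grows, which is the regime in which the Fano bound is most informative.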

It is evident that as SNR $\to \infty$, $P_e \to 0$, so that $I_B(X; Y) \to \log_2 K$ bits. This implies that
the uniform PAM input with mid-point quantizer thresholds is near-optimal at high SNR. The 
issue to investigate therefore is how much gain an optimal quantizer and input pair provides 
over this benchmark at low to moderate SNR. Note that, for 1-bit symmetric quantization, the 
benchmark input corresponds to binary antipodal signalling, which has already been shown to 
be optimal for all SNRs. 

As before, we set the noise variance $\sigma^2 = 1$ for convenience. Of course, the results are
scale-invariant, in the sense that if both $P$ and $\sigma^2$ are scaled by the same factor $R$ (thus keeping the
SNR unchanged), then there is an equivalent quantizer (obtained by scaling the thresholds by
$\sqrt{R}$) that gives identical performance.

Numerical Results 
A. 2-Bit Symmetric Quantization 

A 2-bit symmetric quantizer is characterized by a single parameter $q$, with the quantizer
thresholds being $\{-q, 0, q\}$. We therefore employ a brute force search over $q$ to find an optimal
2-bit symmetric quantizer. In Figure 4, we plot the variation of the channel capacity as a function of
the parameter $q$ at various SNRs.

Fig. 4. 2-bit symmetric quantization: channel capacity (in bits per channel use) as a function of the quantizer threshold $q$
(noise variance assumed constant). Horizontal axis: quantizer threshold $q$.

Based on our simulations, we make the following observations:



• For any SNR, there is an optimal choice of $q$ which maximizes capacity. For the benchmark
quantizer (which is optimal at high SNR), $q$ scales as $\sqrt{\mathrm{SNR}}$, hence it is not surprising to
note that the optimal value of $q$ we obtain increases monotonically with SNR at high SNR.

• The plots show that the capacity varies quite slowly as a function of q. This is because of 
the small variations in the channel transition probabilities (O as a function of q. 

• For any SNR, it is observed that, as $q \to 0$ or $q \to \infty$, we approach the same capacity as
with 1-bit symmetric quantization (not shown for $q \to \infty$ in the plots for 10 and 15 dB in
Figure 4). This conforms to intuition: $q = 0$ reduces the 2-bit quantizer to a 1-bit quantizer,
while $q \to \infty$ renders the thresholds at $-q$ and $q$ ineffective in distinguishing between two
finite valued inputs, so that only the comparison with the quantizer threshold at 0 yields
useful information.
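The brute force search over $q$ can be illustrated with a simplified sketch. As a loudly stated simplification, the input here is held fixed at equiprobable 4-PAM rather than re-optimized for every $q$ (which is what computing the capacity would require), so the curve it produces is a lower bound on the capacity curve of Figure 4. The function names and the grid are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bin_prob(lo, hi, x, sigma=1.0):
    """P(lo <= x + N < hi) for N ~ N(0, sigma^2), using the smaller tail."""
    a, b = (lo - x) / sigma, (hi - x) / sigma
    if a > 0:
        return q_func(a) - q_func(b)
    return q_func(-b) - q_func(-a)

def mutual_information(points, probs, thresholds, sigma=1.0):
    """I(X; Y) in bits for a discrete input and a quantized AWGN output."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    rows = [[bin_prob(lo, hi, x, sigma) for lo, hi in zip(edges, edges[1:])]
            for x in points]
    out = [sum(p * w[i] for p, w in zip(probs, rows))
           for i in range(len(edges) - 1)]
    return sum(p * wi * math.log2(wi / out[i])
               for p, w in zip(probs, rows)
               for i, wi in enumerate(w) if p > 0 and wi > 0)

def sweep_threshold(snr_db, q_grid, sigma=1.0):
    """Return [(q, MI)] for thresholds {-q, 0, q} and a FIXED 4-PAM input.

    Simplification: the input is not re-optimized for each q.
    """
    power = sigma ** 2 * 10 ** (snr_db / 10)
    half_d = math.sqrt(3 * power / 15)               # 4-PAM: K^2 - 1 = 15
    points = [-3 * half_d, -half_d, half_d, 3 * half_d]
    probs = [0.25] * 4
    return [(q, mutual_information(points, probs, [-q, 0.0, q], sigma))
            for q in q_grid]

if __name__ == "__main__":
    grid = [0.05 * k for k in range(1, 201)]
    q_best, mi_best = max(sweep_threshold(15.0, grid), key=lambda t: t[1])
    print(f"15 dB: best q = {q_best:.2f}, MI = {mi_best:.4f} bits")
```

Evaluating the sweep at a very small and a very large $q$ reproduces the third observation for this fixed input: both extremes collapse to the information carried by the sign of the output alone.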

Comparison with the Benchmark: Table II compares the performance of the preceding optimal
solutions with the benchmark scheme. The capacity with 1-bit symmetric quantization is also
shown for reference. In addition to being nearly optimal at moderate to high SNRs, the benchmark
scheme performs fairly well at low SNRs as well. For instance, even at -10 dB SNR, which might
correspond to a UWB system designed for very low bandwidth efficiency, it achieves 86% of
the capacity achieved with optimal choice of 2-bit quantizer and input distribution. On the other
hand, for SNR of 0 dB or above, the capacity is better than 95% of the optimal. These results are
encouraging from a practical standpoint, given the ease of implementing the benchmark scheme.

TABLE II

2-bit symmetric quantization: mutual information (in bits per channel use) achieved by the benchmark
scheme, compared against the optimal solutions. (Table body not recoverable from the source; surviving
fragments: row labels "2-bit optimal" and "2-bit benchmark", entries 0.0049 and 1.9211.)

Fig. 5. 2-bit symmetric quantization: optimal input distribution and quantizer at various SNRs (the dashed vertical lines depict
the locations of the quantizer thresholds).

Optimal Input Distributions: It is interesting to examine the optimal input distributions (given
by the cutting-plane algorithm) corresponding to the optimal quantizers obtained above. Figure 5
shows these distributions, along with optimal quantizer thresholds, for different SNRs. The solid
vertical lines show the locations of the input distribution points and their probabilities, while
the quantizer thresholds are depicted by the dashed vertical lines. As expected, binary signaling
is found to be optimal for low SNR, since it would be difficult for the receiver to distinguish
between multiple input points located close to each other. The locations of the constellation points
for the binary input are denoted by $\{-x_1, x_1\}$ in the 0 dB plot in Figure 5. The number of mass
points increases as SNR is increased, with a new point (denoted $x_0$) emerging at 0. On increasing
SNR further, we see that the points $\{-x_1, x_1\}$ (and also the quantizer thresholds $\{-q, q\}$) move
farther apart, resulting in increased capacity. Finally, when the SNR becomes high enough that four
input points can be disambiguated, the point at 0 disappears, and we get two new points shown
at $\{-x_2, x_2\}$. The eventual convergence of this 4-point constellation to uniform PAM with mid-point
quantizer thresholds (i.e., the benchmark scheme) is to be expected, since the benchmark
scheme approaches the capacity bound of two bits at high SNR. It is worth noting that the
optimal inputs we obtained all have at most four points, even though Proposition 2 (Section
III-B) is looser, guaranteeing the achievability of capacity by at most five points.

B. 3-bit Symmetric Quantization 

For 3-bit symmetric quantization, we need to optimize over a space of 3 parameters:
$\{0 < q_1 < q_2 < q_3\}$, with the quantizer thresholds being $\{0, \pm q_1, \pm q_2, \pm q_3\}$. Since brute force search is
computationally complex, we investigate an alternate iterative optimization procedure for joint
optimization of the input and the quantizer in this case. Specifically, we begin with an initial
quantizer choice $Q_1$, and then iterate as follows (starting at $i = 1$):

• For the quantizer $Q_i$, find an optimal input. Call this input $F_i$.

• For the input $F_i$, find a locally optimal quantizer, initializing the search at $Q_i$. Call the
resulting quantizer $Q_{i+1}$.

• Repeat the first two steps with $i = i + 1$.

We terminate the process when the capacity gain between consecutive iterations becomes less
than a small threshold $\epsilon$.
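The control flow of the iteration above can be sketched generically. This is a skeleton only: the two inner optimizers (`best_input_for`, `best_quantizer_for`) and the `objective` are hypothetical callables supplied by the user, standing in for the cutting-plane input optimization, the local quantizer search, and the mutual information respectively. The toy functions in the usage part are purely illustrative and are not the channel model of this paper.

```python
def alternate_maximization(best_input_for, best_quantizer_for, objective,
                           quantizer, eps=1e-9, max_iter=200):
    """Alternate between the two optimization steps until the gain is < eps.

    best_input_for(Q)        -> input F maximizing objective(F, Q)
    best_quantizer_for(F, Q) -> locally optimal quantizer, search started at Q
    objective(F, Q)          -> value to maximize (mutual information here)
    Returns (F, Q, value). If max_iter is exhausted, the returned Q is one
    update ahead of the returned F.
    """
    previous = float("-inf")
    for _ in range(max_iter):
        inp = best_input_for(quantizer)              # step 1: F_i for Q_i
        value = objective(inp, quantizer)            # "capacity" with Q_i
        if value - previous < eps:                   # termination rule
            break
        previous = value
        quantizer = best_quantizer_for(inp, quantizer)   # step 2: Q_{i+1}
    return inp, quantizer, value

if __name__ == "__main__":
    # Toy stand-in: a jointly concave quadratic with closed-form coordinate
    # maximizers. It only exercises the loop, not the quantized channel.
    f = lambda a, b: -(a - 1) ** 2 - (b - 2) ** 2 - 0.5 * a * b
    best_a = lambda b: 1 - b / 4
    best_b = lambda a, b_start: 2 - a / 4
    print(alternate_maximization(best_a, best_b, f, quantizer=0.0, eps=1e-12))
```

In the toy example the objective is jointly concave, so the iteration reaches the global maximum; as noted in the text that follows, no such guarantee is available for the joint input-quantizer problem.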

Although the input-output mutual information is a concave functional of the input distribution 
(for a fixed quantizer), it is not guaranteed to be concave jointly over the input and the quantizer. 
Hence, the iterative procedure is not guaranteed to provide an optimal input-quantizer pair in 
general. A good choice of the initial quantizer $Q_1$ is crucial to enhance the likelihood that it
does converge to an optimal solution. We discuss this next. 
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TABLE III 

3-bit symmetric quantization: mutual information (in bits per channel use) achieved by the benchmark
scheme, compared against the optimal solutions.



High SNR Regime: For high SNRs, we know that the uniform PAM with mid-point quantizer 
thresholds (i.e., the benchmark scheme) is nearly optimal. Hence, this quantizer is a good choice 
for initialization at high SNRs. The results we obtain indeed demonstrate that this initialization 
works well at high SNRs. This is seen by comparing the results of the iterative procedure with 
the results of a brute force search over the quantizer choice (similar to the 2-bit case considered 
earlier), as both of them provide almost identical capacity values. 

Lower SNRs: For lower SNRs, one possibility is to try out different initializations $Q_1$. However,
on trying out the benchmark initialization at some lower SNRs as well, we find that the iterative 
procedure still provides us with near optimal solutions (again verified by comparing with brute 
force optimization results). While our results show that the iterative procedure (with benchmark 
initialization) has provided (near) optimal solutions at different SNRs, we leave the question of 
whether it will converge in general to an optimal solution or not as an open problem. 

Comparison with the Benchmark: The efficacy of the benchmark initialization at lower SNRs 
suggests that the performance of the benchmark scheme should not be too far from optimal at 
small SNRs as well. This is indeed the case, as shown in Table III. At 0 dB SNR, for instance, the
benchmark scheme achieves 98% of the capacity achievable with an optimal quantizer choice. 

Optimal Input Distributions: The optimal input distributions and quantizers (obtained using 
the iterative procedure) are depicted in Figure 6. Binary antipodal signaling is optimal at low
SNRs (not shown). Increase in the SNR first results in a new mass point at 0, and subsequently 
in a 4-point constellation. The trend is repeated, with the number of mass points increasing with 
SNR, till we get an 8-point constellation which eventually moves towards uniform PAM, and 
the capacity approaches three bits. 
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Fig. 6. 3-bit symmetric quantization : optimal input distribution and quantizer at various SNRs (the dashed vertical lines depict 
the locations of the quantizer thresholds). 



Again, the optimal input distributions obtained have at most $K$ points ($K = 8$), while
Proposition 2 in Section III-B provides the looser guarantee that the capacity is achievable with at
most $K + 1$ points. Of course, the results above are for the particular cases when the quantizers are
also optimal (among symmetric quantizers), whereas Proposition 2 holds for any quantizer choice.
Thus, it is possible that there might exist a $K$-level quantizer for which the capacity is indeed
achieved by exactly $K + 1$ points. We leave open, therefore, the question of whether the result in
Proposition 2 can be tightened to guarantee the achievability of capacity with at most $K$ points.

C. Comparison with Unquantized Observations 

We now compare the capacity results for different quantizer precisions against the capacity with
unquantized observations (depicted in Figure 7). A sampling of these results is provided in Table
IV.

Fig. 7. Capacity with 1-bit, 2-bit, 3-bit, and infinite-precision ADC.

TABLE IV

Impact of low-precision ADC: capacity (in bits per channel use) with different ADC precisions, compared
with the unquantized (infinite-precision) case.

We observe that at low SNR, the performance degradation due to low-precision quantization
is small. For instance, at -5 dB SNR, 1-bit receiver quantization achieves 68% of the capacity
achievable with infinite-precision, while with 2-bit quantization, we can get as much as 90% of
the infinite-precision limit. This is to be expected: if channel noise dominates the actual signal,
increasing the quantizer precision beyond a point does not help much in distinguishing between
different signal levels. The more surprising finding is that, even at moderately high SNRs, the
loss due to low-precision sampling remains quite acceptable. For example, 2-bit quantization
achieves 85% of the capacity attained using unquantized observations at 10 dB SNR, while 3-bit
quantization achieves 85% of the unquantized capacity at 20 dB SNR. Encouraging results of a
similar nature have been reported earlier in [6]. However, the input alphabet there was kept fixed
as binary to begin with, so that the good performance with low-precision receiver quantization
is perhaps less surprising.
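The 1-bit figure quoted above can be checked with a short computation. The sketch assumes the closed form $C_{1\text{-bit}} = 1 - h(Q(\sqrt{\mathrm{SNR}}))$, which follows from the optimality of binary antipodal signaling for the 1-bit symmetric quantizer (the resulting channel is a binary symmetric channel with crossover probability $Q(\sqrt{\mathrm{SNR}})$), together with the standard unquantized capacity $\frac{1}{2}\log_2(1 + \mathrm{SNR})$. The function names are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def capacity_unquantized(snr_db):
    """Real AWGN capacity in bits per channel use: 0.5 * log2(1 + SNR)."""
    return 0.5 * math.log2(1 + 10 ** (snr_db / 10))

def capacity_one_bit(snr_db):
    """Capacity with a 1-bit symmetric quantizer (binary antipodal input)."""
    snr = 10 ** (snr_db / 10)
    return 1.0 - binary_entropy(q_func(math.sqrt(snr)))

if __name__ == "__main__":
    for snr_db in (-5, 0, 5, 10):
        c_inf = capacity_unquantized(snr_db)
        c_1 = capacity_one_bit(snr_db)
        print(f"{snr_db:>3} dB: unquantized {c_inf:.4f}, 1-bit {c_1:.4f}, "
              f"ratio {c_1 / c_inf:.2f}")
```

At -5 dB the ratio evaluates to about 0.68, consistent with the 68% stated in the text.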
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TABLE V 

SNR (in dB) required to achieve a specified spectral efficiency with different ADC precisions.



While the loss in spectral efficiency at fixed SNR is moderate, the loss in power efficiency at 
fixed spectral efficiency is significant (Table V). For example, if the spectral efficiency is fixed
to that attained by an unquantized system at 10 dB (which is 1.73 bits/channel use), then 2-bit 
quantization incurs a loss of 2.30 dB. In practical terms, this penalty in power is more significant 
compared to the 15% loss in spectral efficiency on using 2-bit quantization at 10 dB SNR. This 
suggests, for example, that in order to weather the impact of low-precision ADC, a moderate 
reduction in the spectral efficiency is a better design choice than an increase in the transmit power. 
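The power-efficiency comparison can be approximated with a bisection on SNR. As a stated simplification, the sketch below uses the 2-bit benchmark scheme (equiprobable 4-PAM with mid-point thresholds) in place of the optimal 2-bit input and quantizer, so the SNR it returns is an upper estimate of what the optimal 2-bit scheme requires. It also assumes that the benchmark mutual information increases with SNR over the search interval. All function names are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bin_prob(lo, hi, x):
    """P(lo <= x + N < hi) for N ~ N(0, 1), using the smaller tail."""
    a, b = lo - x, hi - x
    if a > 0:
        return q_func(a) - q_func(b)
    return q_func(-b) - q_func(-a)

def benchmark_mi(K, snr_db):
    """Mutual information (bits) of equiprobable K-PAM with mid-point thresholds."""
    power = 10 ** (snr_db / 10)                      # noise variance is 1
    half_d = math.sqrt(3 * power / (K * K - 1))
    points = [(2 * k - (K - 1)) * half_d for k in range(K)]
    mids = [(points[k] + points[k + 1]) / 2 for k in range(K - 1)]
    edges = [-math.inf] + mids + [math.inf]
    rows = [[bin_prob(lo, hi, x) for lo, hi in zip(edges, edges[1:])]
            for x in points]
    out = [sum(w[i] for w in rows) / K for i in range(K)]
    return sum(wi * math.log2(wi / out[i]) / K
               for w in rows for i, wi in enumerate(w) if wi > 0)

def snr_for_rate(target_bits, K=4, lo_db=-10.0, hi_db=40.0, tol=1e-4):
    """Smallest SNR (dB) at which the benchmark reaches target_bits (bisection)."""
    while hi_db - lo_db > tol:
        mid = 0.5 * (lo_db + hi_db)
        if benchmark_mi(K, mid) < target_bits:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

if __name__ == "__main__":
    target = 0.5 * math.log2(1 + 10.0)               # unquantized rate at 10 dB
    needed = snr_for_rate(target)
    print(f"target {target:.3f} bits: benchmark needs {needed:.2f} dB "
          f"(loss {needed - 10.0:.2f} dB)")
```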



VI. Conclusions 

Our Shannon-theoretic investigation indicates that the use of low-precision ADC is a feasible 
option for designing future high-bandwidth communication systems. The choice of low-precision 
ADC is consistent with the overall system design goals for systems such as UWB and mm wave 
communication, where power is at a premium, due to regulatory restrictions as well as due to the 
difficulty of generating large transmit powers with integrated circuits in low-cost silicon processes 
(e.g., see [38] for discussion of mm wave CMOS design). Power-efficient communication dictates
the use of small constellations, so that the symbol rate, and hence the sampling rate, for a given 
bit rate must be high. This forces us towards using ADCs with lower precision, but fortunately, 
this is consistent with the use of small constellations in the first place for power-efficient design. 
Thus, if we plan on operating at low to moderate SNR, the small reduction in spectral efficiency 
due to low-precision ADC is acceptable in such systems, given that bandwidth is plentiful. 

There are several unresolved technical issues that we leave as open problems. While we
show that at most $K + 1$ points are needed to achieve capacity for a $K$-level quantizer, our
numerical results show that at most $K$ points are needed. Can this be proven, at least for
symmetric quantizers? Are symmetric quantizers optimal? Does our iterative procedure (with
the benchmark initialization, or some other judicious initialization) for joint optimization of the
input and the quantizer converge to an optimal solution in general? Are there other, provably
optimal techniques with substantially lower complexity than brute force search to perform this
joint optimization?

A technical assumption worth revisiting is that of Nyquist sampling (which induces the 
discrete-time memoryless AWGN-Quantized Output channel model considered in this work). 
While symbol rate Nyquist sampling is optimal for unquantized systems in which the transmit 
and receive filters are square root Nyquist and the channel is ideal, for quantized samples, we 
have obtained numerical results that show that fractionally spaced samples can actually lead to 
small performance gains. A detailed study quantifying such gains is important in understanding 
the tradeoffs between ADC speed and precision. However, we do not expect oversampling to play 
a significant role at low to moderate SNR, given the small degradation in our Nyquist sampled 
system relative to unquantized observations (for which Nyquist sampling is indeed optimal) in 
these regimes. Of course, oversampling in conjunction with hybrid analog/digital processing (e.g., 
using ideas analogous to delta-sigma quantization) could produce bigger performance gains, but 
this falls outside the scope of the present model. 

While our focus in this paper was on non-spread systems, it is known that low-precision 
ADC is often employed in spread spectrum systems for low cost implementations [39]. In our
prior examination of Shannon limits for direct sequence spread spectrum systems with 1-bit 
ADC [9], we demonstrated that binary signaling was suboptimal, but did not provide a complete 
characterization of an optimal input distribution. The approach in the present paper implies that, 
for a spreading gain G, a discrete input distribution with at most G + 2 points can achieve 
capacity (although in practice, much smaller constellations would probably work well). 

Finally, we would like to emphasize that the Shannon-theoretic perspective provided in this 
paper is but a first step towards the design of communication systems with low-precision ADC. 
Major technical challenges include the design of ADC-constrained methods for carrier and timing 
synchronization, channel equalization, demodulation and decoding. 
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Appendix I : Achievability of Capacity 

Theorem 3 [40], [41]: Let $V$ be a real normed linear vector space, and $V^*$ be its normed dual
space.

(a) A weak* continuous real-valued functional $f$ evaluated on a weak* compact subset $\mathcal{F}$ of $V^*$
achieves its maximum on $\mathcal{F}$.

(b) If in addition to part (a), $\mathcal{F}$ is a convex subset, and $f$ is a convex functional, then the
maximum is achieved at an extreme point of $\mathcal{F}$.

Proof: For part (a), see [40, p. 128, Thm 2]. Part (b) follows from the Bauer Maximum
Principle (see, for example, [41, p. 211]), which holds since the dual space $V^*$, equipped with
the weak* topology, is a locally convex Hausdorff space [41, p. 205]. ■

The use of part (a) of the theorem to establish the existence of capacity-achieving input
distributions is standard (see [30], [42] for details). To use this theorem for our channel model
(1), we need to show that the set $\mathcal{F}$ of all average power constrained distribution functions is
weak* compact, and the mutual information functional $I$ is weak* continuous over $\mathcal{F}$, so that
$I$ achieves its maximum on $\mathcal{F}$. The weak* compactness of $\mathcal{F}$ follows by [42, Lemma 3.1]. To
prove continuity, we need to show that

$$F_n \xrightarrow{\text{weak}^*} F \implies I(F_n) \to I(F).$$

The finite cardinality of the output for our problem trivially ensures this. Specifically,

$$I(F) = H_Y(F) - H_{Y|X}(F) = -\sum_{i=1}^{K} R(y_i; F) \log R(y_i; F) + \int dF(x) \sum_{i=1}^{K} W_i(x) \log W_i(x),$$

where

$$R(y_i; F) = \int_{-\infty}^{\infty} W_i(x)\, dF(x).$$

The continuous and bounded nature of $W_i(x)$ ensures that $R(y_i; F)$ is continuous (by the
definition of weak* topology). Moreover, the function $\sum_{i=1}^{K} W_i(x) \log W_i(x)$ is also continuous and
bounded, implying that $H_{Y|X}(F)$ is also continuous (again by the definition of weak* topology).
The continuity of $I(F)$ thus follows.

Note: An extreme point of a convex set $\mathcal{F}$ is a point that is not obtainable as a mid-point of two distinct points of $\mathcal{F}$.
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Appendix II : Proof of Theorem 1 (Discrete Capacity- Achieving Distribution) 

The proof is along the same lines as Witsenhausen's proof in lIH, except that we have an 

additional average power constraint on the input. 

Proof: Let $\mathcal{S}$ be the set of all average power constrained distributions with support in
the interval $[A_1, A_2]$. The required capacity, by definition, is $C = \sup_{\mathcal{S}} I(X; Y)$, where $I(X; Y)$
denotes the mutual information between $X$ and $Y$. The achievability of the capacity is guaranteed
by Theorem 3(a) in Appendix I: [42, Lemma 3.1] ensures the weak* compactness of the set
$\mathcal{S}$, while weak* continuity of $I(X; Y)$ is easily proven given the assumption that the transition
functions $W_i(x)$ are continuous. Let $S^*$ be a capacity-achieving input distribution.

The key idea that we employ is a theorem by Dubins lEl, which characterizes extreme points 
of the intersection of a convex set with hyperplanes. We first give some necessary definitions, 
and then state the theorem. 

Definitions : 

• Let $\mathcal{L}$ be a vector space over the field of real numbers, and $\mathcal{M}$ be a convex subset of $\mathcal{L}$.
$\mathcal{M}$ is said to be linearly bounded (linearly closed) if every line intersects $\mathcal{M}$ in a bounded
(closed) subset of the line.

• Let $f : \mathcal{L} \to \mathbb{R}$ be a linear functional (not identically zero). The set $\{x \in \mathcal{L} : f(x) = c\}$
defines a hyperplane, for any real $c$.

Dubins' Theorem: Let $\mathcal{M}$ be a linearly closed and linearly bounded convex set, and let $\mathcal{U}$ be the
intersection of $\mathcal{M}$ with $n$ hyperplanes. Then every extreme point of $\mathcal{U}$ is a convex combination
of at most $n + 1$ extreme points of $\mathcal{M}$.

To apply Dubins' theorem to our problem, we begin by defining $C[A_1, A_2]$: the real normed
linear space of all continuous functions on the interval $[A_1, A_2]$, with sup-norm. The dual of
$C[A_1, A_2]$ is the space of functions of bounded variation [40, Sec. 5.5], and it includes the
(convex) set of all distribution functions with support in $[A_1, A_2]$. We take $\mathcal{L}$ to be the dual of
$C[A_1, A_2]$, and $\mathcal{M}$ to be the subset of $\mathcal{L}$ consisting of all distribution functions with support in
$[A_1, A_2]$. Note that the optimal input distribution $S^* \in \mathcal{M}$.

Let the probability vector of the output $Y$, when the input is $S^*$, be $R^* = \{p_1^*, p_2^*, \ldots, p_K^*\}$.
Also, let the average power of the input under the distribution $S^*$ be $P_0$, where $P_0 \le P$.
Now, consider the following subset $\mathcal{U}$ of $\mathcal{M}$:

$$\mathcal{U} = \{M \in \mathcal{M} \mid R(y; M) = R^* \text{ and } E(X^2) = P_0\}. \qquad (12)$$

The set $\mathcal{U}$ is the intersection of the set $\mathcal{M}$ with the following $K$ hyperplanes:

$$H_i : \int_{A_1}^{A_2} W_i(x)\, dM(x) = p_i^*, \quad 1 \le i \le K - 1, \qquad (13)$$

and

$$H_K : \int_{A_1}^{A_2} x^2\, dM(x) = P_0, \qquad (14)$$

where $W_i(x)$ are the transition probability functions. Note that there are only $K - 1$ hyperplanes
in (13) since the probabilities must sum to 1, thus making the requirement on $p_K^*$ redundant.

We know that the set $\mathcal{M}$ is compact in the weak* topology [42, Lemma 3.1]. Also, each of
the hyperplanes $H_i$, $1 \le i \le K - 1$, is a closed set since the functions $W_i(x)$ are continuous.
The hyperplane $H_K$ is closed as well, since $x^2$ is a continuous function. Therefore, the set $\mathcal{U}$,
being the intersection of a weak* compact set with $K$ closed sets, is weak* compact. It is easy
to see that $\mathcal{U}$ is a convex set as well. On the set $\mathcal{U}$, we have

$$I(X; Y) = H(Y) - H(Y|X) = -\sum_{i=1}^{K} p_i^* \log p_i^* + \int_{A_1}^{A_2} dM(x) \sum_{i=1}^{K} W_i(x) \log W_i(x).$$

As a function of the distribution $M(\cdot)$, we get

$$I(X; Y) = \text{constant} + \text{linear},$$

and the linear part is weak* continuous since $\sum_{i=1}^{K} W_i(x) \log W_i(x)$ is in $C[A_1, A_2]$.

It follows from Theorem 3(b) in Appendix I that the continuous linear (and hence convex) functional $I(X; Y)$
attains its maximum over the compact convex set $\mathcal{U}$ at an extreme point of $\mathcal{U}$. However, since
$S^* \in \mathcal{U}$, any maximum over $\mathcal{U}$ is a maximum over $\mathcal{S}$ as well. Hence, the required capacity is
achieved at an extreme point of $\mathcal{U}$.

We now apply Dubins' theorem to characterize the extreme points of $\mathcal{U}$. Since $\mathcal{U}$ is the
intersection of $\mathcal{M}$ with $K$ hyperplanes, every extreme point of $\mathcal{U}$ is a convex combination
of at most $K + 1$ extreme points of $\mathcal{M}$. The extreme points of $\mathcal{M}$, however, are distributions
concentrated at single points within the interval $[A_1, A_2]$. Therefore, we get that the required
capacity is achievable by a discrete distribution with at most $K + 1$ points of support. ■

Fig. 8. The second derivative of $h(Q(\sqrt{y}))$ is positive everywhere.



Appendix III : Convexity of the Function $h(Q(\sqrt{y}))$

To show convexity, we verify that the second derivative of the function $h(Q(\sqrt{y}))$ is positive
everywhere. For $y \ge 2$, we do this analytically, while for $0 < y < 2$, the positivity of the second
derivative is demonstrated numerically in Figure 8.

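Let $u(y) = h(Q(\sqrt{y}))$. Then,

$$u'(y) = -\frac{e^{-y/2}}{2\sqrt{2\pi y}\,\ln 2}\, \ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right).$$

Note that $\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})} > 1$, $\forall y > 0$. Therefore, to show that the second derivative $u''(y)$ is positive,
it suffices to show that the function

$$v(y) = e^{-y/2} \ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right)$$

is a decreasing function of $y$. Taking the derivative of $v(y)$, we get

$$v'(y) = -\frac{e^{-y/2}}{2}\left[\ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right) - \frac{e^{-y/2}}{\sqrt{2\pi y}\, Q(\sqrt{y})\,(1 - Q(\sqrt{y}))}\right].$$

To show that $v(y)$ is decreasing, it suffices to show that

$$\ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right) \ge \frac{e^{-y/2}}{\sqrt{2\pi y}\, Q(\sqrt{y})\,(1 - Q(\sqrt{y}))}. \qquad (15)$$

Using the fact [28, pp. 78] that $Q(y) > \left(1 - \frac{1}{y^2}\right) \frac{e^{-y^2/2}}{y\sqrt{2\pi}}$, we get that if $y > 1$, then the following
condition is sufficient for (15) to be true:

$$\ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right) \ge \frac{1}{\left(1 - \frac{1}{y}\right)(1 - Q(\sqrt{y}))}, \qquad (16)$$

or, equivalently,

$$\left(1 - \frac{1}{y}\right)(1 - Q(\sqrt{y})) \ln\left(\frac{1 - Q(\sqrt{y})}{Q(\sqrt{y})}\right) \ge 1. \qquad (17)$$

The left hand side of (17) is a monotone increasing function of $y$. For $y = 2$, it equals 1.133.
Thus (17) holds $\forall y \ge 2$, and hence the second derivative of $h(Q(\sqrt{y}))$ must be positive for
$y \ge 2$.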
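Both parts of this argument are easy to spot-check numerically. The sketch below evaluates the left hand side of (17) and estimates the second derivative of $u(y)$ on $0 < y \le 2$ by central finite differences. The function names, the step size, and the grid are hypothetical choices, and a finite-difference check is of course only a sanity check, not a proof.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def u(y):
    """u(y) = h(Q(sqrt(y))), with the binary entropy h measured in bits."""
    p = q_func(math.sqrt(y))
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def lhs_17(y):
    """Left hand side of (17): (1 - 1/y) (1 - Q) ln((1 - Q) / Q), Q = Q(sqrt(y))."""
    p = q_func(math.sqrt(y))
    return (1 - 1 / y) * (1 - p) * math.log((1 - p) / p)

def second_derivative(f, y, step=1e-3):
    """Central finite-difference estimate of f''(y)."""
    return (f(y + step) - 2 * f(y) + f(y - step)) / step ** 2

if __name__ == "__main__":
    print(f"LHS of (17) at y = 2: {lhs_17(2.0):.4f}")
    for k in range(1, 21):
        y = 0.1 * k
        print(f"y = {y:.1f}: u''(y) ~ {second_derivative(u, y):.4f}")
```

At $y = 2$ the left hand side evaluates to approximately 1.1337, in agreement with the value 1.133 quoted above.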
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